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Abstract 


Comparative  genome  analyses  of  the  Francisella  tularensis  subspecies  tularensis  and  subspecies 
holartica  populations  show  that  genome  content  is  highly  conserved  and  only  a  relatively  small 
number  of  genes  within  the  F.  tularemis  subsp,  tularemis  genome  are  abseni  in  oilier  F. 
tularensis  subspecies.  To  catalogue  differences  in  genome  organization  that  could  contribute  to 
the  unique  virulence  characteristics  and  geographic  distributions  of  the  tularensis  and  holartica 
subspecies,  we  have  used  Paired-End  Sequence  Mapping  (PESM)  to  identify  regions  of  the 
genome  that  are  non-conti guous  between  thse  two  subspecies.  Using  PESML  the  physical 
distances  between  paired  -end  sequencing  reads  from  a  library  of  a  vvildtype  reference  F. 
tularensis  subsp.  holurtica  strain  were  compared  to  the  predicted  lengths  between  the  reads 
based  on  map  coordinates 'of  the  reads  from  the  subsp.  tularemis  strain  Schu  S4  and  subsp. 
holartica  strain  LVS  genome  sequences.  A  total  of  17  different  continuous  regions  were 
identified  in  the  subsp.  holartica  genome  (CRh»i-.m.-,)  that  are  non-com iguous  in  the  subsp. 

tularensis  genome. . At  least  six  of  the  seventeen  different  are  positioned  as  adjacent 

pairs  in  the  subspecies  tularensis  genome  sequence  but  are  translocated  in  holartica.  implying 
that  arrangements  of  the  a  are  ancestral  in  the  tularensis  subspecies  and  derived  in 

holartica. . Using  nested  PCR  assays,  the  conservation  of  the  events  was  further  assessed  by 

l^!il3S-5Z_MLd^liOIMlL.t?w/i^!Et!j^2M?^i!DSLAiQsfgrx/|«^subsi>ecies_isoIates.  The  PCR  results  showed  that 
the  arrangements  of  the  CRi„,iari,.  n  are  highly  conserved,  particularly  in  the  liolailica  subspecies- 
consistent  with  the  hypothesis  that  holartica  populations  have  recently  experienced  a  periodic 
selection  event  or  they  have  emerged  from  a  recent  clonal  expansion.  Two  unique  ntlarensis- 
like  strains  were  also  observed  to  share  some  CRi„,i-,r,i.---,  with  the  holartica  subspecies  and  others 
with  the  tularensis  subspecies,  implying  that  these  strains  may  represent  a  new  taxonomic  unit. 
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introduction 

Francisella  tularensis  is  a  non-motile,  Gram-negative  coccobacillus  originally  isolated 
from  ground  squirrels  in  1911  during  a  plague  investigation  in  Tulare  County,  CA  [1],  The 
geographic  distribution  of  the  organism  spans  the  entire  Northern  Hemisphere,  with  only  a  very 
recent  isolated  recovery  of  the  organism  occurring  in  the  Southern  Hemisphere  [2,  3].  The 
organism  is  a  facultative  intracellular  pathogen  and  is  believed  to  affect  more  animal  species  than 
any  other  known  zoonotic  pathogen  [4,  5],  It  has  been  isolated  from  as  many  as  250  species  of 
wildlife  (reviewed  by  Oysten,  Sjostedt,  et.  al)  [6]  including  various  birds,  amphibians,  fish  and 
many  mammalian  species.  The  organism  can  also  be  found  in  invertebrates  species,  including 
arthropod  vectors  such  as  mosquitoes  and  ticks  (reviewed  by  Petersen  and  Schriefer)  [7]. 

Human  infection  occurs  most  often  through  direct  exposure  to  infected  animals  or  by  bites  from 
infected  arthropod  vectors.  Recently,  terrestrial  and  aquatic  life  cycles  have  been  described  for 
F.  tularensis  [8,  9];  and  protozoa,  such  as  Acanthamoeba  castellanii ,  may  also  serve  as  a  host  for 
maintenance  of  F.  tularensis  in  the  aquatic  cycle  [10]. 

The  species  F.  tularensis  is  comprised  of  four  recognized  subspecies:  Subsps.  tularensis 
(Type  A),  holarctica  (Type  B),  novicida,  and  mediaasiatica,  the  two  former  of  which  are 
considered  clinically  significant  in  humans  [11,12]  and  by  far  have  been  the  most  studied.  F. 
tularensis  subsp.  tularensis  is  believed  to  be  more  virulent  in  humans  than  F.  tularensis  subsp. 
holarctica  based  on  epidemiological  data  and  its  higher  infectivity  in  animals.  F.  tularensis 
subsp.  tularensis  and  F.  tularensis  subsp.  holarctica  also  show  striking  geographic  differences  in 
their  distribution,  with  both  the  tularensis  and  holarctica  subsp.  being  found  in  North  America 
but  only  the  holarctica  subsp.  being  found  in  Europe  and  Asia  [7],  Populations  of  subsp. 
mediaasiatica  may  be  even  more  geographically  limited  since,  as  its  name  suggests,  this 
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subspecies  has  only  been  isolated  from  the  Asian  subcontinent.  The  novicida  subspecies  has 
been  found  primarily  in  the  U.S.  but  was  recently  detected  in  Australia  [11,  13]. 

Despite  the  unique  geographic  and  virulence  characteristics,  known  genetic  and 
phenotypic  differences  distinguishing  the  tularensis  and  holarctica  subpopulations  seem  to  be 
more  limited.  Biochemically,  the  two  subspecies  have  classically  been  differentiated  primarily 
on  the  basis  of  glycerol  fermentation,  production  of  citrulline  ureidase,  and  erythromycin 
resistance  [11].  High  resolution  genotyping  methods  such  as  pulsed  field  gel  electrophoresis 
(PFGE)  [14],  restriction-fragment  length  polymorphism  (RFLP)  [15],  Amplified  Fragment 
Length  Polymorphisms  (AFLP)  [14],  and  Multi-Locus  Variable  Number  Tandem  Repeat 
analysis  (MLVA)  [4, 16, 17],  also  distinguish  the  subspecies  genotypically  and  show  that  they 
are  divergent,  but  clonal  ly  related. 

Given  the  unique  geographical  and  virulence  characteristics,  there  is  tremendous  interest 
in  understanding  the  genetic  basis  for  these  characteristics.  Recent  comparative  genome 
hybridization  studies  identified  limited  differences  in  genome  content  between  the  two 
subspecies,  but  did  include  deletion  in  the  pdpD  region  which  is  associated  with  virulence  [18, 
19],  Comparative  genome  sequencing  efforts  are  also  underway  and  promise  to  provide  detailed 
information  with  regard  to  specific  strains.  To  provide  a  more  complete  catalogue  of  the 
genomic  events  which  arose  early  during  divergence  of  the  subspecies  (true  supsbepcies-specific 
genomic  differences  as  opposed  to  strain-level  differences),  we  have  applied  Paired  End 
Sequence  Mapping  (PESM)  to  identify  candidate  regions  of  genomic  difference  and  further  used 
Comparative  Genome  PCR  (CG-PCR)  on  a  large  set  of  strains  to  identify  regions  of  genomic 
difference  that  are  conserved  across  multiple  isolates.  PESM  was  originally  developed  as  a 
method  to  identify  genomic  islands  of  Shigella  dysenteriae  [20].  The  PESM  strategy  measures 
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the  physical  distance  between  paired-end  reads  from  a  clone  library,  specifically  searching  for 
clones  whose  physical  distance  is  incongruent  with  the  predicted  distance  based  on  available 
genome  sequences.  In  this  application,  we  constructed  a  library  from  an  F.  tularensis  subsp. 
holarctica  strain  and  compared  physical  distances  with  the  F.  tularensis  SHU  S4  genome 
sequence.  Cloned  segments  with  incongruent  lengths  compared  to  the  map  position  were  further 
distinguished  as  strain-specific  versus  potentially  subspecies-specific  by  comparison  to  the  F. 
tularensis  subsp.  holarctica  strain  LVS  genome  sequence.  In  instances  where  the  length 
difference  was  conserved  in  the  reference  strain  and  the  LVS  holarctica  strain,  the  segments 
were  further  tested  among  a  panel  of  holarctica  and  tularensis  strains  to  confirm  that  the  genome 
difference  was  broadly  conserved  across  the  subspecies.  Using  this  strategy,  we  identified 
seventeen  regions  in  the  genome  that  are  continuous  in  66  of  67  subspecies  holarctica  strains 
examined,  but  which  are  discontinuous  in  tularensis  strains.  These  regions,  termed  CRhoiarctica, 
have  arisen  through  massive  insertion/deletion,  translocation,  and  rearrangement  events  and  their 
conservation  among  holarctica  strains  of  distinct  temporal  and  geographic  origin  implies  that 
this  subspecies  has  likely  been  through  a  recent  periodic  selection  event. 
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Materials  &  Methods 


Bacterial  strains  and  growth  conditions.  Cultures  for  this  study  were  propagated  on  chocolate  agar  at 
37°C  in  5%  C02.  Glycerol  fermentation  of  the  limited  number  of  isolates  was  performed  either  as 
described  [19],  or  by  using  Biolog®  (Biolog,  Inc.,  Hayward,  CA)  according  to  manufacturer’s 
instructions.  A  summary  of  spatial,  temporal,  host,  and  other  pertinent  demographic  information,  as  well 
as  prior  subspecies  determinations  of  all  strains  and/or  DNA  used  in  this  study,  is  listed  in  supplemental 
Table  1  (Table  si). 

Subspecies-specific  PCR.  To  confirm  the  subspecies  designation  of  all  the  strains  in  our  collection,  we 
tested  them  using  an  RP1  PCR  assay  previously  shown  to  differentiate  among  all  four  F.  (utarensis 
subspecies  H  81.  All  RD 1  results  are  included  in  Table  si .  Printers  for  RP1  were  those  same  as 
published  hv  Brockhuiisen  cl  al  [  1 S],  All  RD1  PCR  reactions  were  performed  in  25  til  volumes,  each 
containing  5  inM  MeCb  and  160  iiM  of  each  dNTP  (Idaho  Technology,  Salt  Lake.  UT).  500  nM  each  of 
forward  and  reverse  printer  (invitrogen.  Carlsbad.  CA).  and  2.5  Units  of  Platinum  Tag  flnvitrogenl.  Each 
reaction  was  conducted  on  1 .5  ul  DNA  samples  either  prepared  by  using  a  standard  large-scale  bacterial 
genomic  DNA  extraction  protocol  1231  or  using  PIJREGENE'  DNA  isolation  kits  (Centra  Systems.  Inc.. 
Minneapolis.  MN).  Thcrmocvcling  conditions  were  optimized  and  performed  on  a  Dyad  (MJ  Research. 
Reno.  NV)  thcrmocvclcr  according  to  the  following  cycling  parameters:  Initial  hold  at  95°C  for  2  iriin.  30 
sec:  30  cycles  of  95°C  for  30  sec.  64°C  for  I  min,  and  72°C  for  1  min;  final  extension  at  72°C;  and  a  final 
indefinite  hold  at  4°C.. 


Construction  of),  phage  library.  Francisella  tularensis  subsp.  holarctica  strain  MS304  is  a  human 
isolate  obtained  in  2002  from  the  State  of  Missouri.  A  library  was  constructed  from  MS304  genomic 
DNA  by  partial  digestion  with  Sau3  Al .  After  optimization  of  the  partial  digestion  for  1 0-1 5  Kb 
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fragments,  7  ug  of  genomic  DNA  was  digested  with  &ju3A1  at  in  separate  reactions  with  0.0625,  0.0312, 
and  0.0156  units  per  microgram  for  1  hour  at  37°C.  The  partially  cut  DNA  was  electrophoresed  on  a 
0.7%  agarose  gel  along  with  molecular  weight  markers  and  the  regions  containing  10-15  kb  fragments 
were  excised  from  the  gel.  The  fragments  were  electroluted,  pooled,  and  then  precipitated  to  concentrate. 

Size  distribution  of  the  gel-purified  fragments  was  confirmed  by  agarose  gel  electrophoresis  of  a  small 

portion  of  the  fragments  alongside  molecular  weight  standards.  The  remaining  purified  fragments  were  _  _ 

.  {  Formatted:  Font:  Italic 

then  ligated  into  Lambda  DASH  II  JSam  HI  (Stratagene,  La  Jolla,  CA).  The  ligations  were  packaged 
using  Stratagene’s  Gigapack  III  Gold  Extract  according  to  the  manufacturer’s  recommendations.  The 
packaged  phage  were  titered  on  XLl-Blue  MRA  P2  host  bacteria  (Stratagene).  The  titer  from  the 
packaging  was  approximately  5  X  106  PFU/ml.  Library  diversity  was  confirmed  by  restriction  digestion 
and  DNA  sequence  analysis  of  inserts  from  1 0  independent  plaques.  The  library  was  then  amplified 
once  using  XLl-Blue  MRA  P2  host  bacteria  and  DMSO  was  added  to  7%  final  concentration  in  the 
clarified  supernatant.  The  amplified  library  had  a  titer  of  2.5  x  107  pfu/ml.  One  ml  aliquots  were  stored 
at  -80°C. 

Direct-PCR  Amplification  (DPA)  of  Cloned  Fragments.  For  high  throughput  PCR  amplification  of 
individual  plaques,  the  library  was  diluted  and  plated  onto  mid-logarithmic  phase  P-2  cells  (OD«io  ~0.6) 
grown  in  NZYM  Broth  with  0.2%  maltose.  Dilutions  of  the  library  stock  were  made  in  Lambda 
suspension  medium  (SM)  buffer  and  dilutions  yielding  approximately  120-150  plaques  perplate  were 
subsequently  plated  onto  several  80  cm  Petri  dishes  using  P2  host  cells.  After  solidifying,  the  plates  were 
inverted  in  the  incubator  overnight  at  37°C. 

To  amplify  plaques  for  length  measurement  and  paired-end  sequencing,  plaques  were  chosen 
from  dilutions  giving  120-150  well  isolated  plaques.  Plates  with  the  appropriate  number  of  plaques  were 
first  wrapped  with  parafilm  and  inverted  at  4°C  for  a  minimum  of  4  hours  and  a  maximum  of  4  days. 

Clearly  isolated  plaques  were  further  processed  by  direct-PCR  amplification  (DPA).  DPA  was  performed 
in  96-well  PCR  plates  (Applied  Biosystems,  Foster  City,  CA)  pre-loaded  with  PCR  master  mix.  Plaques 
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for  DPA  were  loaded  by  gouging  each  candidate  plaque  with  a  sterile  20  ul  pipette  tip  (such  that  the  tip 
contained  visible  plaque  material)  and  then  mixing  the  tip  in  a  single  well  of  the  PCR  plate.  Long-range 
PCR  was  then  performed  using  a  TaKaRa  Ex  Taq™  Hot  Start  master  mix  kit  (TaKaRa  Mirus  Bio, 

Madison,  WI)  containing  T3  (5'-AATTAACCCTCACTAAAGGG-3')  and  T7  (5’- 
TAATACGACTCACTATAGGG-3’)  primers,  which  prime  amplification  from  the  T3  and  T7  promoter 
regions  present  in  the  arms  of  the  lambda  cloning  vector.  PCR  reaction  mixes  were  prepared  according  to 
the  manufacturer’s  recipe  with  each  single  25  ul  reaction  containing  500  nM  of  each  primer. 

Thermocycling  conditions  were  optimized  and  performed  on  a  Dyad  (MJ  Research,  Reno,  NV) 
thermocycler  according  to  the  following  cycling  parameters:  Initial  hold  at  95°C  for  2  mins,  30  sec;  36 
cycles  of  95°C  for  50  sec,  55°C  for  50  sec,  and  72°C  for  15  mins;  final  extension  at  72°C  for  5  mins;  and  a 
final  indefinite  hold  at  4°C. 

Following  each  PCR  run,  clone-amplicon  purification  was  accomplished  using  Montage®  PCRM96 
Plates  (Millipore,  Billerica,  MA)  which  were  processed  on  a  SAVM  384  Vacuum  Manifold  (Millipore) 
according  to  the  manufacturer’s  instructions.  The  final  elution  was  performed  using  30  ul  of  Invitrogen 
Distilled  DNase-/RNase-free  water  and  plates  were  sealed  with  adhesive  sealing  lids  (Bio-Rad,  Hercules, 

CA)  and  stored  at  4°C. 

Clone- Amplicon  Sizing  Experiments.  Amplicon-size  determinations  were  performed  by  agarose  gel 
electrophoresis.  1 5  cm  x  25  cm  0.65%  agarose  gels  containing  5 1  -wells  were  made  in  1  x  Tris-acetate- 
EDTA  (TAE).  Each  gel  accommodated  47  clone-amplification  reactions  along  with  4  separate  lanes  of  1- 
15  kb  Molecular  Ruler  (Bio-Rad)  for  length  standardization.  Each  gel  was  electrophoresed  at  85  V  for 

{Deleted:  G 

approximately  4.5  hours.  Ethidium  bromide-stained  gels  were  imaged  using  a  Syngene  GeneGenius . 

(Synoptics,  Frederick,  MD)  imaging  system,  and  the  image  was  analyzed  using  the  GeneTools 
(Synoptics)  software  package.  The  band  sizes  for  all  clones  and  DNA  standards  were  exported  from 
GeneTools  into  a  Microsoft  Excel  spreadsheet  and  loaded  into  the  PESM  pipeline. 
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Paired-End  Sequencing.  DNA  sequencing  reactions  for  all  clones  were  carried  out  using  BigDye* 

Terminator  v3.1  Cycle  Sequencing  Kits  (Applied  Biosystems)  with  pGEM  DNA  serving  as  the  sequence 
reaction  controls.  Each  sequence  reaction  experiment  was  setup  in  96-well  reaction  plates,  with  each 
sample  being  divided  into  two  reactions  -  one  reaction  with  T3  primer  and  the  other  with  T7  primer  (both 
primers  the  same  as  from  the  initial  PCR  reactions).  Sequence  reactions  were  carried  out  on  the  Dyad 
(MJ  Research)  thermocycler,  and  the  sequence  reaction  plate  was  cleaned-up  using  a  Montage  SEQ96  Kit 
on  the  SAVM  384  Vacuum  Manifold  (Millipore).  The  labeled  DNA  was  then  transferred  into  new 
reaction  plates  and  loaded  onto  an  ABI  3100  automated  capillary-electrophoresis  (CE)  sequencer 
(Applied  Biosystems),  which  was  configured  with  an  80  cm  capillary  array  and  loaded  with  Performance 
Optimized  Polymer  (POP)-4  (Applied  Biosystems).  Control  sequencing  reactions  were  performed  on  the 
two  pGEM  vector  reactions  in  each  set  of  94  sample  reactions,  and  the  pGEM  control  sequences  were 
evaluated  to  ensure  the  sequence  reaction  and  sequence  run  were  successful.  The  .abi  sequence  files  were 
trimmed  in  Sequencher  V.  4.0.5  (Gene  Codes,  Ann  Arbor,  Ml),  merged,  and  output  as  a  single  FASTA 
file  for  further  analysis. 

Fragment  Length  and  Paired-End  Sequence  Pipeline.  The  sizing  and  sequence  files  were  input  into  a 
Perl-based  program,  referred  to  as  the  Paired-End  Sequence  Mapping  Pipeline  (PESMP),  to  identify 
coordinates  of  the  paired  end  reads  from  the  draft  F.  tularensis  live  vaccine  strain  (LVS)  whole  genome 

..(Deleted:  (WGS) 

scquence/ftp://bbm.  I  Ini.  aov/pub/cbnn/F-tularensis/F.  tularensis. htnill  [21]  and  the  completed  SCHU  S4 
genome  sequence  [22]  and  then  to  compare  the  predicted  lengths  to  the  physical  length  of  the  cloned 
segment.  The  PESMP  input  files  consisted  of  a  composite  FASTA  file  from  the  trimmed  sequences 
resulting  from  a  single  96-well  plate  along  with  a  corresponding  Microsoft  Excel  file  containing  the 
Long-PCR  amplicon  sizing  data  for  each  clone.  The  PESMP  algorithm  then  outputs  the  coordinates  and 
predicted  length  of  the  fragment  corresponding  to  the  paired  end  reads  from  the  two  genome  sequences 
along  with  the  physical  measurement  of  the  long  PCR  amplicon. 
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To  identify  clones  with  discrepancies  between  physical  length  between  paired-end  reads  and  _ 
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predicted  length,  the  output  from  the  PESM  pipeline  wasjoaded  into  an  Excel  spreadsheet  and  sorted  ,  _ 
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according  to  the  paired-end  coordinates  from  thcJLVS  genome  sequence.  Clones  in  which  the  physical 


size  showed  a  >  2  Kb  discrepancy  from  the  predicted  size  of  the  SCHU  S4  genome  but  not  the  LVS 
genome  were  chosen  for  further  characterization.  ^Sequence  of  these  putative  regions  of  genomic 
difference  was  obtained  by  electronic  PCR.  using  the  coordinates  of  the  paired-end  reads  to  extract  the 
jntervenina  sequence  between  the  reads  from  the  LVS  genome  sequence.  These  jelectroniqclone 
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sequences  were  then  aligned  by  contig  analysis  using  Sequencher  to  further  delimit  the  physical 
boundaries  of  the  CRhoiarctica- 
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CRhoiarctica,  each  CRhoiarctica  was  used  for  BLAST  analysis  against  the  SCHU  S4  genome.  Query 


coordinates  for  all  segments/subsegments  (shown  in  Table  s2),  except  for  multiple  repeats  of  IS  elements 
were  then  used  to  map  the  location  of  the  events  in  the  SCHU  S4  genome  corresponding  to  the  CRh0iarctica- 
CRhoiarctica  maps  were  assembled  using  the  SeqBuilder  module  of  Lazergene  V.6.0  (DNASTAR,  Madison, 
WI)  to  identify  corresponding  gene/pseudogene  content  of  the  CRhoiarctica  from  the  SCHU  S4  annotation. 
jAll  genes  from  SCHU  S4  annotation  found  to  be  truncated  at  the  flanks  due  to  the  beginning  or  ending  of 
the  CRhoiarctica  clones,  as  well  as  those  found  internally  due  to  rearrangements  within  the  CRhoiarctica,  were 
used  in  BLAST  analvses^against  the  LVS  sequence  for  homology  comparisons.  TIGR  in-house  Perl 
scripts  were  run  on  a  Linux  platform  to  generate  graphical  representations  of  all  combined  CRhoiarctica 
mapped  on  SCHU  S4  shown  in  Fig.  3,  and  each  of  6  individual  CRhoiarctica  mapped  on  SCHU  S4  shown  in 
Fig.  5  and  supplemental  sFig.  1  through  sFig.  5.  The  final  figures  as  shown  were  assembled  in  Adobe 
Illustrator  10. 
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<  results  are  included  in  Table  si.  Primers 
!  for  RD1  were  those  same  as  published  by 
■  Broekhuijsen  eta/.  [18].  All  RD1  PCR 
reactions  were  performed  in  25  pi 
volumes,  each  containing  5  mM  MgCb 
and  160  pM  of  each  dNTP  (Idaho 
Technology,  Salt  Lake,  UT),  500  nM 
each  of  forward  and  reverse  primer 
(Invitrogen,  Carlsbad,  CA),  and  2.5  Units 
of  Platinum  Taq  (Invitrogen).  Each 
reaction  was  conducted  on  1.5  pi  DNA 
samples  either  prepared  by  using  a 
standard  large-scale  bacterial  genomic 
DNA  extraction  protocol  [23]  or  using 
PUREGENE®  DNA  isolation  kits  (Gentra 
Systems,  Inc.,  Minneapolis,  MN). 
Thermocycling  conditions  were 
optimized  and  performed  on  a  Dyad  (MJ 
Research,  Reno,  NV)  thermocycler 
according  to  the  following  cycling 
parameters:  Initial  hold  at  95°C  for  2 
min,  30  sec;  30  cycles  of  95°C  for  30  sec, 
64°C  for  1  min,  and  72°C  for  1  min;  final 
extension  at  72°C;  and  a  final  indefinite 
hold  at  4°C.  H 
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Comparative  Genome  PCR  (CG-PCR).  Confirmation  of  the  CRhoiarc,iCa  was  conducted  by  PCR  analysis 
on  multiple  isolates  of  the  Jiolarctica  andjularensis  subspecies.  For  each  CRhpiarctica,  a  three-primer 
nested-PCR  assay  was  designed  based  on  the  relative  coordinates  of  the  corresponding  junctions  of  the 
CRhoiarctica  maps.  Primers  for  all  assays  were  designed  using  Primer3  software 
(httn://frodo. wi.mit.edu/cgi-bin/primer3/nrimer3  www.cgil.  The  assays  were  designed  by  first 
identifying  a  primer  common  to  both  LVS  and  SCHU  S4,  either  forward  or  reverse  (designated  C-F  or  C- 
R),  immediately  adjacent  to  a  breakpoint  in  the  SCHU  S4  genome  sequence  where  synteny  of  the 
adjacent  segment  changed  or  was  translocated  leaving  a  SCHU  S4-specific  region  for  a  SCHU  54- 
specific  (S)  primer,  and  which  likewise  left  a  target  for  an  LVS-specific  (L)  primer  in  the  adjacent- 
contiguous  LVS  sequence.  The  design  of  all  assays  was  such  to  produce  “A”-type  bands  across  all  17  CR 
for  SCHU  S4,  and  “B”-type  bands  across  all  17  CR  for  LVS.  Since  either  intact  or  truncated  IS  elements 
or  their  corresponding  repeated  elements  were  present  near pll  breakpoints,  care  was  taken  to  avoid 
placement  of  primers  within  IS  element.^  (see  Table  s3  for  all  primer  coordinates,  sequences,  and  expected 
pmplicon  sizes).  All  PCR  assays  were  conventional  by  design  and  conducted  on  1 .5  pi  of  the  DNA 
samples  used  for  RD1  PCR.  All  CG-PCR  reactions  were  performed  in  25  pi  volumes,  each  containing  5 
mM  MgCl2  and  160  pM  of  each  dNTP  (Idaho  Technology,  Salt  Lake,  UT),  500  nM  each  of  common 
primer,  LVS-specific  primer,  and  SCHU  S4-specific  primer  (Invitrogen,  Carlsbad,  CA),  and  2.5  Units  of 
Platinum  Taq  (Invitrogen).  Thermocycling  conditions  were  optimized  and  performed  on  a  Dyad  (MJ 
Research,  Reno,  NV)  thermocycler  according  to  the  following  cycling  parameters:  Initial  hold  at  95°C 
for  2  min,  30  sec;  32  cycles  of  95°C  for  30  sec,  60°C  for  1  min,  and  72°C  for  1  min;  final  extension  at 
72°C;  and  a  final  indefinite  hold  at  4°C.  Each  assay  was  first  tested  against  SCHU  S4  and  LVS,  and  then 
against  a  panel  of  91  additional  global  Francisella  strains  representing  both  subspecies  as  well  as  subsp. 
novicida  and  a  unique  holarctica  strain  from  Japan  [13,  15, 24],  tentatively  called  subsp.  japonica  [4], 
but  for  our  studies  referred  to  as  subsp.  holarctica- japan,  or  “holarc-jap”  (see  Table  si  for  panel 
composition).  The  amplicons  were  visualized  on  0.8%  - 1%  agarose  gels  run  in  IX  TAE  at  85  V  for 
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approximately  2  hours,  or  until  adequate  size-discrimination  was  accomplished.  A  100-bp  PCR 
Molecular  Ruler  ranging  from  100  bp  to  3  kb  (Bio-Rad)  was  used  for  size  determinations. 


^Results  _  _  _ _ _ _ _ _ 

Paired-End  Sequencing.  A  total  of  752  plaques  were  picked  and  subjected  to  DP  A  with  551  of  the  DP  A 
yielding  amplicons  >8  Kb  in  length  that  were  of  sufficient  quality  and  quantity  for  size  determination  and 
DNA  sequence  analysis  (DP A  success  rate  of  73.3%).  The  mean  amplicon/insert  size  from  the  551 
successful  DPA  reactions  was  14,174  bp,  which  corresponds  to  approximately  7.8  Mb  of  coverage,  or  an 
estimated  4.1  X  coverage  of  the  1.89  Mb  F.  tularensis  subsp.  tularensis  genome  [22], 
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Mapping  of  Paired-End  Sequence  Reads.  Of  the  55 1  clones  with  quality  paired-end  reads,  66  clones 
had  physical  lengths  that  were  not  congruent  with  distance  between  the  paired-end  reads  relative  to  the 
SCHU  S4  genome,  but  were  congruent  with  distances  predicted  from  the  LVS  genome.  These  clones 
were  further  considered  as  candidates  for  subspecies-specific  genomic  events.  Alignment  of  the 
sequences  from  the  paired-end  reads  of  candidate  clones  grouped  the  66  cloned  segments  into  17  different 
contiguous  regions  (CRhoiarctica)  which  align  with  the  F.  tularensis  subsp  holarctica  strain  LVS  genome  but 
are  non-contiguous  or  otherwise  altered  in  the  F.  tularensis  subsp.  tularensis  SCHU  S4  genome  sequence. 
Plotting  of  the  number  of  CRhoiarctica  identified  versus  the  total  number  of  clones  sequenced  (Fig.  1 ), 
showed  that  the  number  of  new  CRhoiarctica  began  to  decrease  sharply  after  250  clones  were  sequenced.  Of 
the  last  301  clones  sequenced,  only  three  new  CRhoiarctica  were  identified,  suggesting  that  the  library  was 
nearly  saturated. 


Comparative  Genome  PCR  (CG-PCR)  confirmation  of  CRh0iarc,i„.  Conservation  of  the  CRhoiarctica  in 
the  MS304  reference  strain — which  was  isolated  in  2002,  and  is  temporally  and  geographically  distinct 
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from  the  LVS  strain  isolated  in  1941 — leads  to  the  simple  hypothesis  that  these  CR  likely  arose  early 
during  divergence  of  the  tularensis  and  holarctica  subspecies  and  therefore  should  be  conserved  across 
most  holarctica  strains.  To  confirm  this,  CG-PCR  assays  were  developed  for  each  CRho]arctica  using  nested 
primer  sets  at  the  junctions  of  the  CR  (Table  s3).  The  different  CG-PCR  reactions  for  all  1 7  CRhoiarctica 
were  then  run  on  a  panel  of  DNA  samples  from  SCHU  S4  and  LVS,  and  also  from  19  different  subsp. 
tularensis  strains,  67  subsp.  holarctica  strains,  3  subsp.  novicida  strains,  and  a  single  strain  each  of  subsp. 
holarctica-japan  and  F.  philomiragia.  The  results  for  all  17  CG-PCR  panels  are  shown  in  Fig.  2.  The 
colors  correspond  to  different  size  amplicons  produced  from  each  CG-PCR.  Overall,  the  subsp. 
holarctica  strains  produced  homogenous  results  across  all  17  CR,  with  66  of  the  strains  (98.5%) 
producing  the  expected  amplicon  based  on  the  LVS  genome  sequence.  The  only  deviation  occurred  in  F. 
tularensis  subsp.  holarctica  strain  Tu-42,  which  produced  a  subsp.  tularensis  A-type  band.  Thus, 
excluding  this  one  exception,  the  CRhoiarctica  identified  through  the  PESM  pipeline  are  indeed  highly 
conserved. 

Unlike  the  67  subsp.  holarctica  strains,  the  19  subsp.  tularensis  strains  displayed  significantly 
more  heterogeneity  in  the  CG-PCR  assays.  At  least  four  different  subgroups  of  the  tularensis  subspecies 
can  be  resolved.  All  share  the  RD1  region  in  common,  implying  that  tliev  are  taxonoinicallv  true  subsn. 
ftilcfrensis  deri vati  ves.  jSCHU  S4  and  nine  other  subsp.  tularensis  strains  comprise  one  subgroup  and  have 
identical  genome  organization,  producing  A-type  bands  (red  squares),  across  all  17  CR,  A  second 
subgroup  is  represented  by  A88R160,  88R52,  88R144,  AK-1 133496,  AK-1 100558,  AK-1 100559,  with 
each  of  these  strains  sharing  an  amplicon  from  the  CR10  nested  PCR  reaction  that  was  unique  in  size 
(denoted  by  orange  squares).  All  six  of  these  strains  were  isolated  from  rabbits  or  hares,  with  the  three 
AK  strains  derived  from  Alaska  in  2003  and  2004  and  the  three  others  isolated  from  the  contiguous 
United  States.  The  ATCC  6223  strain  was  likewise  shown  to  be  identical  the  previous  six  strains,  but  it 
was  originally  isolated  from  a  human  patient  in  1920  and  has  since  lost  its  virulence  [4],  A  third 
subgroup  is  represented  by  the  single  strain  OK-98041035  which  matches  the  SCHU  S4  subgroup  except 
that  it  failed  to  produce  a  CRM  PCR  amplicon  (denoted  by  a  yellow  square).  The  fourth  subgroup 


comprises  a  very  unique  set  of  two  isolates,  strains  WY-00W41 14  and  WY-WSVL02.  These  strains  both 
produced  an  A-type  band  at  CR3,  CR4,  CR5,  CR6,  CR8,  CR1 1 ,  CRI 5,  and  CR1 7  (red  squares),  were 
negative  at  CRI  and  CR13  (yellow  squares),  and  produced  B-type  bands  at  CR7,  CR9,  CR12,  and  CR14 
(green  squares).  Unlike  any  other  strains,  they  also  produced  unique  bands  at  CRI 6  (blue-grey)  and 
CR10  (blue-grey,  different  in  size  from  orange).  These  two  strains  were  also  distinguishable  from  one 
another  in  that  WY-00W41 14  produced  a  B-type  band  at  CR2  (green)  whereas  WY-WSVL02  was 
negative  (yellow).  These  two  strains  were  also  only  slightly  capable  of  fermenting  glycerol  [19], 
Collectively,  the  genetic  and  biochemical  data  strongly  suggest  these  two  strains  represent  a  new 
taxanomic  unit.  If  indeed  this  is  a  new  taxon,  then  the  population  is  likely  to  be  virulent  since  one  of  the 
isolates  was  obtained  from  a  human  clinical  sample  [19]. 

As  would  be  expected,  the  F.  tularensis,  subsp.  novicida,  subsp.  holarctica-japan,  and  F. 
philomiragia  strains  showed  heterogenous  CG-PCR  results.  F.  philomiragia  was  negative  (yellow) 
across  each  of  the  1 7  CR.  The  three  subsp.  novicida  strains  were  negative  (yellow)  for  CRI ,  CR4,  CRI  4, 
and  CRI  6;  and  they  produced  a  unique  size  amplicon  from  (blue-grey)  CR3,  CR5,  CR7,  CR8,  CR9, 
CR10,  CR12,  CR13,  and  CR17.  All  three  strains  produced  a  “B”-  type  allele  (green)  across  CR6  and 
CRI 5  and  they  all  produced  an  “A”-type  allele  (red)  across  CRI  1 .  CR2  differentiated  between  the  Tu-43 
strain,  which  produced  a  unique  amplicon  while  the  other  two  novicida  strains  (from  ATCC  and 
USAMRIID)  which  were  both  negative.  Consistent  with  its  classification  as  a  separate  subspecies  [4],  the 
single  subsp.  holarctica-japan  strain  was  also  distinct  from  all  other  strains  in  this  study;  it  produced  a 
“B”-type  allele  (green)  across  CRI,  CR2,  CR3,  CR5,  CR6,  CR7,  CR9,  CR10,  CRI  1,  CR12,  CR15,  CR16, 
a  unique  allele  (orange)  across  CRI  3,  an  “A”-type  allele  (red)  across  CRI  7,  and  was  negative  (yellow) 
across  CR4,  CR8,  and  CRI  4. 

Fine  Structure  Analysis  of  CRhourctk.-  Fine-structure  mapping  and  annotation  of  the  CRh0iarciica  was  next 
conducted  by  alignment  of  the  CRhoiarcticaContigs  from  strain  LVS  genome  sequence  with  the  SCHU  S4 
genome  sequence.  The  corresponding  locations  of  the  aligned  regions  are  shown  in  Fig.  3.  The 
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combined  DNA  represented  by  the  17  CRs  corresponds  to  nearly  230  genes/pseudogenes  and  over  30 IS- 
elements,  mainly  a  combination  of  ISftul  and  ISftu2  elements.  Most  of  the  of  the  rearrangements  and 
translocations  are  juxtaposed  to  IS  elements,  suggesting  that  many  of  the  events  were  likely  mediated  by 
these  elements,  resulting  in  remarkably  large  changes  in  the  location  of  specific  genome  segments 
between  SCHU  S4  and  LVS.  but  with  little  effect  on  the  corresponding  content  of  the 

transposed/lranslocated  regions.  Indeed  nearly  all  of  the  genes  wi t h in  jh e .f  Rk,i,-, arc  present  in  both . 

the  subsp.  holarctica  LVS  and  subsp.  tularensis  SCHU  S4  genome.  ,  . . 

The  distribution  of  the  CR1-CR17  segments  around  the  LVS  genome  and  the  relative  positions  of 
the  corresponding  regions  in  the  SCHU  S4  genome  have  some  remarkable  characteristics.  First,  the  CR 
in  the  LVS  genome  show  some  positional  bias,  with  thirteen  of  the  seventeen  CRhoiarctica  being  present  in 
roughly  one-half  of  the  genome  (the  region  between  8  o’clock  and  2  o’clock  extending  from  1 .3  Mb  to 
0.3  Mb).  Secondly,  there  are  three  notable  instances  where  segments  from  different  CRh0iarciica  in  LVS  are 
positioned  adjacent  to  one  another  in  the  SCHU  S4  genome.  Specifically,  segments  from  CR  1  and  CR)  6 
are  adjacent  in  the  SCHU  S4  genome  as  are  segments  from  CR13  and  CR15,  and  CR4  and  CR10,  and  all 
three  events  are  illustrated  in  sFig.  4.  The  juxtapositioning  of  these  segments  in  subsp.  tularensis 
suggests  that  their  organization  in  thepilarensis  subspecies  was  the  ancestral  state  while  their 
organization  in  holarctica  is  a  derived  state.  As  shown  in  sFig.  4c,  this  notion  is  further  supported  by  the 
finding  that  both  CR13  and  CR15  contain  genes  that  are  involved  in  glycerol  fermentation  and  it  seems 
likely  that  their  ancestral  condition  would  have  been  functionally  clustered. 
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Genes  affected  by  rearrangements,  further  bioinformatics  analysis  of  the  junction  of  the  juxtaposed 
CR4-CR10  segment  revealed  the  complete  deletion  of  a  gene  of  unknown  function  (FTT  1308c)  from 
subsn.  jiolartica.  pl  additioryiine  genes  were  found  which  are  disrupted  as  a  consequence  of  the 
rearrangements  or  translocation  of  genome  segments  in  the  CR.  The  intact  versions  of  these  genes  in  the 
SCHU  S4  genome  encode  proteins  with  significant  similarity  to  oligopeptide  transporters  ( oppD  and 
oppF),  a  ribosome  modification  gene  (rimfC),  an  acetyltransferase  (FTTO 1 77c),  and  genes  of  unknown 
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function  (corresponding  to  FTT0898c,  FTT1 122,  FTT0921,  and  FTT131 1).  ASFig.  4c  illustrates  the  region 
of  CR13  in  SCHU  S4  containing  the  intact  oppD  and  oppF  genes  which  are  truncated  in  the  LVS 
genome.  The  aceF  gene,  encoding  the  E2  of  pyruvate  dehydrogenase  lies  near  the  junction  of  CR3 
(along  with  aceE  and  Ipd)  and  carries  a  300  base  in-frame  deletion.  A  finc-structut; genome  comparison 

of  the  entire  CRh0iarcfa3  mapped  onto  SCHU  S4  is  shown  in  (Fig. _ or  sFig. _ ).  The  deletion 

corresponds  to  loss  of  a  repeated  biotin-binding  repeat  region,  leaving  holarctica  strains  with  two  biotin¬ 
binding  domains  while  the  tularensis  strains  contain  three.  Whether  the  deletion  occurred  during  the 
translocation  event  and  whether  it  affects  function  of  the  pyruvate  dehydrogenase  complex  is  not  clear. 
The  AceF  orthologues  from  several  pathogenic  species,  including  Vibrio,  Yersinia,  Shigella  and  E.  coli, 
do  carry  three  domains.  It  has,  however,  been  shown  that  deletion  of  two  of  the  three  domains  of  AceF  in 
E.  coli  has  little  affect  on  function  [25],  Whether  the  additional  binding  domain  influences  efficiency  of 
the  reaction  and  whether  it  could  contribute  to  virulence  will  require  further  experimentation.  It  is  worth 
noting  that  the  aceF  truncation  was  also  identified  by  comparative  genome  hybridization  studies  [12,  18, 
19],  but  the  translocation  event  corresponding  to  CR3  was  not  detected,  underscoring  the  importance  of 
using  multiple  approaches  for  comparative  genome  analyses. 

In  addition  to  direct  disruption  as  a  consequence  of  translocation,  genes  near  the  junctions  of  the 
translocation  events  could  also  be  subject  to  control  by  unique  regulatory  machinery.  In  this  light,  it  is 
interesting  to  note  that  some  of  the  genes  within  the  CR  and  near  the  junctions  could  have  functions 
related  to  physiology  and  virulence  of  F.  tularensis.  Two  different  genes  encoding  pilin  subunits  (pi IE 
homologues)  of  a  type  IV  pilus,  are  present  within  the  CR2  and  CR10,  and  fine-structurq,  genome 

comparisons  of  these  CRh0iarctica  mapped  onto  SCHU  S4  are  shown  in  sFig. _ and  sFig. _ ,  respectively. 

The  gene  encoding  the  pilin  subunit  pilE5  (FTT0230c)  [26]  is  embedded  within  CR2.  Another  member 
of  the  pilE  family  is  also  present  in  CR10  and  the  region  upstream  is  disrupted  by  IS  elements.  These  IS 
elements  arc  also  associated  with  disruption  and  duplication  pf  the  FTT13 1  j  gene  in  LVS  as  compared  to 
a  single,  intact  copy  in  SCHU  S4. 
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Because  glycerol  fermentation  has  classically  served  as  a  biochemical  murker  distinguishing  the 


jujarerms.  tindjiolarfica  subspecies,  we  also  scanned  the  CR  for  genes  associated  wjtbalvoerol 
metabolism.  Several  genes  encoding  enzymes  associated  with  glycerol  fermentation  were  found.yathin  _ 
jCRl  1.  CRI  3.  and  CR15.  F ine-structunj  genome  comparisons  of  each  of  these  CRtotatia  mapped  onto 

SCHU  S4  are  shown  in  (Fig. _ or  sFig. _ and/or  sFig. _ ).  'There  is  no  difference  in  the  content  of 

genes  within  CR1 1 .  CRI 3.  and  CRI 5  between  the  two  subspecies,  however  it  is  possible  that  their  unique 
organization  contributes  to  the  different  glycerol  fermentation  phenotypes.  T 
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Discussion: 


Whole  genome  sequencing  has  provided  an  outstanding  resource  for  comparative  genome  studies, 
allowing  high-resolution  snapshots  of  the  genetic  diversity  found  within  a  given  species.  One  of  the 
drawbacks  of  comparative  genome  sequencing,  however,  is  that  only  limited  numbers  of  strains  or  taxa 
can  reasonably  be  compared,  making  it  difficult  to  distinguish  between  strain-level  genomic  differences 
and  true  lineage  or  population-specific  sets  of  genes.  Comparative  genome  hybridization  using  DNA 
microarrays  circumvents  this  problem  to  some  degree  by  providing  a  single  platform  for  comparison  of 
multiple  strains  or  taxa.  On  the  other  hand,  the  array  approach  is  limited  to  assessing  the  diversity  in 
genetic  content  that  is  represented  on  the  array.  J-lere.  we  have shown  that  PESM !  caii i  help  eircunivent  the 
strain  bias  and  the  representation  problems  associated  with  whole  genome  sequencing  and  DNA 
niieroarravs.  PESM  was  originally  developed  as  a  means  to  identify  unique  genomic  islands  [20],  In  our 
application,  we  scaled  PESM  for  comparative  genome  studies.  TGiven  at  least  one  reference  genome 
sequence.  PESM  provides  an  economical  means  to  identify  candidate  regions  of  genomic  difference,  and 
these  regions  can  be  further  examined  in  larger  strain  sets  by  nested  CG-PCR.  The  PESM  library  used  in 
our  study  carried  modest  sized  fragments  of  the  genome  (averaging  14  Kb),  such  that  coverage  could  be 
obtained  with  a  reasonable  amount  of  sequencing  without  severely  limiting  the  ability  to  measure  physical 
size.  PESM  libraries,  however,  can  be  made  using  different  insert  sizes  in  different  types  of  vectors,  such 
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that  coverage  per  clone  can  be  increased  with  larger  segments  while  resolution  can  be  increased  by 
sequencing  a  larger  number  of  segments  from  small-fragment  libraries. 

In  addition  to  economy,  the  PESM  approach  allows  any  strain  to  be  used  as  a  source  of  the 
library,  thereby  allowing  the  user  to  choose  the  best  taxonomic  unit  as  a  reference.  This  is  particularly 
important  when  multiple  subpopulations  of  a  species  may  display  unique  characteristics  that  are  of 
interest.  Indeed,  although  we  have  used  PESM  in  a  binary  comparison  ( holarctica  compared  to 
tularensis),  it  is  possible  to  scaffold  multiple  libraries  into  the  same  PESM  pipeline  using  only  a  single 
reference  genome  upon  which  to  scaffold  the  data. 

Genome  diversity  in  F.  tularensis.  Populations  of  highly  virulent  bacteria  display  a  wide  spectrum  with 
respect  to  genetic  diversity.  On  the  one  hand,  populations  of  species  such  as  Bacillus  anthracis  display 
very  little  diversity  and  only  a  limited  number  of  clones  appear  to  be  spread  worldwide  [27],  These 
clones  can  only  be  differentiated  by  examining  variation  in  tandem  repeats,  which  are  some  of  the  most 
rapidly  evolving  loci  in  the  genome  [28].  With  the  availability  of  multiple  genome  sequences,  single 
nucleotide  polymorphisms  will  soon  complement  or  displace  the  MLVA-based  approaches.  At  the  other 
end  of  the  spectrum  are  subpopulations  of  E.  coli  OI57:H7  which,  despite  the  presence  of  highly  clonal 
signatures  in  their  genomic  backbone,  display  substantial  genomic  diversity,  even  being  detectable  by  a 
relatively  low-resolution  method  as  Pulsed  Field  Gel  Electrophoresis  [29-31]. 

Based  on  the  data  described  in  our  study,  we  believe  that  Francisella  tularensis  may  represent  an 

,  |  Deleted:  represents  an 

intriguing  model  of  genome  evolution.  Previous  studies  of  genetic  diversity  in  F.  tularensis  detected  only  . 
limited  diversity  [11, 32].  The  four/7,  tularensis  subspecies  are  known  to  share  98%  identity  in  their  16S 
rRNA,  show  very  similar  biochemical  profiles,  and  have  quite  similar  antigenic  compositions  [11, 33], 

Only  very  high  resolution  methods  can  provide  any  phylogenetic  signal  that  reasonably  correlates  with 
biochemical  and  virulence  characteristics. 

Despite  the  apparently  limited  degree  of  genetic  diversity,  the  F.  tularensis  subspecies  display 
quite  distinct  geographic  distribution  and  virulence  characteristics.  Thus,  it  was  initially  believed  that 
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although  limited,  the  diversity  in  genomic  content  would  parallel  phylogeographic  and  epidemiologic 
characteristics  and  provide  clues  to  the  genetic  basis  for  these  traits.  With  the  exception  of  differences  in 
numbers  of  pilEAike  loci  [26]  and  the  loss  of  the  pdpD  locus  [2 1  ],  no  other  obvious  candidate  virulence 
genes  [6, 22]  emerged  from  comparative  genome  hybridization  studies  [12, 18, 19],  and  >99%  of  the 
genetic  material  present  in  the  more  virulent  subspecies  tularensis  can  also  be  found  in  the  holarctica 
genome. 

In  the  present  study,  we  now  show  that  despite  the  high  degree  of  genetic  conservation,  the 
genome  organization  of  the  tularensis  and  holarctica  subspecies  is  vastly  different.  At  least  1 7 
substantial  genomic  events  have  occurred  during  divergence  of  these  two  subspecies  and  have  been 
preserved  among  multiple  strains  of  each  population.  The  events  correspond  to  extensive  translocations 
and  rearrangements,  many  if  not  all  of  which  were  mediated  by  movement  of  IS  elements.  As  shown  in 
Fig.  3,  the  IS  elements  are  plentiful  in  the  SCHU  S4  genome,  with  50  different  copies  of  ISftul  and  16 
copies  of  ISftu2  being  distributed  around  the  genome  [22],  Given  the  large  number  of  these  elements,  it 
is  therefore  not  surprising  to  find  them  at  or  near  the  junctions  of  all  1 7  CR.  Certainly,  IS  elements  were 
also  found  abutting  subspecies-specific  regions  of  genomic  difference  (RD)  observed  in  comparative 
genome  hybridization  studies  [  1 8, 1 9],  and  our  data  here  further  confirm  that  IS  elements  are  the  primary 
means  through  which  this  genome  diversifies. 

While  the  degree  of  diversity  in  organization  between  the  genomes  of  subsp.  tularensis  and 
subsp.  holarctica  is  remarkable,  perhaps  equally  remarkable  is  the  degree  to  which  the  unique  structure  is 
preserved  across  temporally  and  spatially  distinct  taxa  of  holarctica  strains.  This  observation  leads  to 
several  interesting  possible  hypotheses.  First,  it  is  possible  that  population  growth  is  very  minimal  such 
that  little  diversity  has  had  time  to  accrue.  However,  because  Francisella  is  free-living  and  is  also 
capable  of  infecting  many  different  mammalian  hosts,  slow  population  turnover  in  the  environment  would 
seem  to  be  an  unlikely  explanation.  A  second  explanation  is  that  IS  elements  move  only  at  a  very  low 
frequency,  thus  generating  diversity  only  on  a  very  slow  timescale.  In  this  instance,  the  divergence  would 
have  been  quite  ancestral  given  the  degree  of  diversity  that  has  accrued.  With  the  number  of  ISftul  and 
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ISftu2  elements  in  the  genome,  this  explanation  is  unsatisfying.  Moreover,  we  detected  significant,,  _ .  Deleted:  more 

diversity  among  the  CRhoiarctica  regions  within  the  subsp.  tularensis  strains,  suggesting  that  the  ISftu  are 
indeed  functional.  Lastly,  and  more  likely,  it  is  also  possible  that  the  extant  populations  of  holarctica  are 
quite  homogenous  because  they  share  very  recent  common  ancestry.  This  hypothesis  would  imply  that 
the  populations  have  recently  been  through  periodic  selection  or  they  arose  from  recent  emergence, 
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The  holarctica  subspecies  is  likely  a  derived  state.  Given  thejhigh  degree  of  virulence  that  is  displayed 
by  the  tularensis  subspecies,  it  has  been  speculated  that  jit  represents  the  ancestral  state  while  the  less 
virulent  subspecies  are  derived  states  [  1 2]  that  are  more  adept  at  infecting  hosts  without  killing.  In 
support  of  this  hypothesis,  the  novicida  subsp.  can  be  found  in  water,  implying  that  it  may  survive  more 
effectively  in  the  free  living  state  than  the  more  highly  virulent  tularensis  subspecies.  Evolutionary 
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analysis  of  VNTR  loci  also  suggest  that  tularensis  is  likely  more  similar  to  the  common  ancestor  [4,  16, 
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17],  Wjjth  respect  to  genome  organization,  our  data  also  support  this  hypothesis,  showing  that 
organization  of  different  CR^h,,  appeal;  to  be  a  derived  state,  arising  by  dissociation of genomic unite  _ 
through  translocation  events  occurring  in  an  immediate  ancestor  of  ihcjiotanica  populations.  At  least 


three  genomic  segments  were  found  to  be  single  contigs  in  the  Jularensis  genome  but  are  dispersed  into 


six  different  CR  in  the  holarctica  subsp.  Moreover,  ^ome  genes  at  the  junctions  of  these  events  are . 
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fragments  being  present  at  the  junctions  of  tularensis.  Furthermore,  the  translocation  events  in  at  least 


two  different  CR  have  separated  functionally  associated  genes  that  are  putatively  involved  with  glycerol 
fermentation  jn  holarctica.  again  their  scattering  being  consistent  with  a  derived  condition.  We  also  note 
that  three  additional  CR  in  tularensis  (CR3-CR9,  CR4-CR8,  and  CR9-CR1 1)  are  adjacent,  but  not 
contiguous  whereas  they  are  highly  dispersed  in  the  holarctica  subspecies.  Therefore,  several  lines  of 
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evidence  are  beginning  to  mount  jn  favor  of  the  hypothesis  that  F.  tularensis  subsp.  tularensis  is  likely 
more  similar  to  the  ancestral  state  while  popn  lations  of  the  Jmhinica subsp.  are  derived  states.  If  this  is 
true,  then  analysis  of  genomic  content  and  organization  between  the  different  subspecies  should  provide 
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insights  not  only  into  virulence  characteristics,  but  also  into  selective  pressures  that  have  led  to  emergence 
and  geographical  spread  of  th ejiolarlica  populations.  T 


Two  US  strains  may  represent  a  unique  taxanomic  unit  within  Francisella.  Although  our  search  was 
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primarily  focused  on  identifying  population-specific  regions  of  genomic  difference,  the  genome 


organization  observed  in  the  subsp.  tularensis  strains  WY00W41 14  and  WY-WSLVL02  is  very 


intriguing.  Their  pattern  of  genome  organization  is  clearly  distinct  from  the  tularensis  and  holarctica 


populations,  sharing  CR  “alleles”  at  some  loci  with  tularensis  strains,  CR  “alleles”  at  other  loci  with 
holarctica  strains,  and  unique  alleles  at  still  other  loci.  The  fact  that  some  C'R  alleles  are  shared  with  both 
subspecies  suggests  three  different  possibilities.  First,  the  strains  could  have  descended  from  a 
recombination  event  in  which  DMA  was  acquired  from  one  subspecies  and  recombined  into  another. 

Given  that  the  distribution  of  the  CRholartica  “alleles”  present  in  these  strains  is  highly  dispersed. 
multiple  such  recombination  events  would  have  to  he  invoked  to  account  for  the  genomic  organization.  A 
second  explanation  is  that  the  CRholartica  in  these  strains  are  homoplnsious.  arising  through  independent 
events  within  a  separate  lineage  of  descent  from  the  common  ancestor.  This  explanation  is  also 
unsatisfying  in  that  multiple  such  events  must  be  invoked,  each  of  which  occurred  similarly  in  at  least  two 
different  lineages.  Lastly,  it  is  possible  that  these  strains  arc  derived  from  an  evolutionary  intermediate. 
and  thus  only  some  of  the  CT^mjaJiawjcciirrcd  and  are  preserved.  This  explanation  invokes  the  fewest 
number  of  events  and  requires  that  the  events  only  occur  in  a  single  lineage.  Thus,  we  favor  the 
evolutionary  intermediate  explanation.  Regardless  of  the  explanation  for  the  events,  the  unique 
combination  of  CR  in  these  strains  implies  that  populations  represented  by  these  strain  should  have 

distinct  taxonomic  status. . It  will  be  interesting  to  conduct  further  analyses  on  additional  isolates  to 

determine  if  the  evolutionary  intermediate  model  holds  up.  , 
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Figure  1.  Cumulative  count  of  CK  contigv  as  a  function  of  paired-end  reads.  t 
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